Detecting the large entries of a sparse covariance matrix in sub-quadratic time
Authors
Abstract
The covariance matrix of a p-dimensional random variable is a fundamental quantity in data analysis. Given n i.i.d. observations, it is typically estimated by the sample covariance matrix, at a computational cost of O(np^2) operations. When n and p are large, this computation may be prohibitively slow. Moreover, in several contemporary applications, the population matrix is approximately sparse, and only its few large entries are of interest. This raises the following question: assuming approximate sparsity of the covariance matrix, can its large entries be detected much faster, say in sub-quadratic time, without explicitly computing all of its p^2 entries? In this paper, we present and theoretically analyze two randomized algorithms that detect the large entries of an approximately sparse sample covariance matrix using only O(np poly log p) operations. Furthermore, assuming sparsity of the population matrix, we derive sufficient conditions on the underlying random variable and on the number of samples n for the sample covariance matrix to satisfy our approximate sparsity requirements. Finally, we illustrate the performance of our algorithms via several simulations.
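For reference, the quadratic-cost baseline that the abstract contrasts against can be sketched as follows. This is only the naive approach (compute every entry of the sample covariance matrix, then keep the large ones), not either of the paper's randomized algorithms. The function names `sample_covariance` and `large_entries` and the `threshold` parameter are hypothetical, chosen for illustration.

```python
from itertools import combinations


def sample_covariance(rows):
    """Naive sample covariance of n observations of a p-dimensional variable.

    rows: list of n lists, each of length p.
    Visits every pair of coordinates, so the cost is O(n * p^2) operations.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j, p):
            # Unbiased estimate: divide by n - 1.
            s = sum(c[j] * c[k] for c in centered) / (n - 1)
            cov[j][k] = cov[k][j] = s
    return cov


def large_entries(cov, threshold):
    """Off-diagonal index pairs (j, k), j < k, with |cov[j][k]| >= threshold.

    `threshold` is an assumed, user-chosen cutoff for what counts as "large".
    """
    p = len(cov)
    return [(j, k) for j, k in combinations(range(p), 2)
            if abs(cov[j][k]) >= threshold]
```

Because both functions touch all p^2 entries, this sketch is the computation the paper aims to avoid when only a few entries are large.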
Similar resources
Sparse Hanson-Wright inequalities for subgaussian quadratic forms
In this paper, we provide a proof for the Hanson-Wright inequalities for sparse quadratic forms in subgaussian random variables. This provides useful concentration inequalities for sparse subgaussian random vectors in two ways. Let X = (X_1, ..., X_m) ∈ R^m be a random vector with independent subgaussian components, and ξ = (ξ_1, ..., ξ_m) ∈ {0, 1}^m be independent Bernoulli random variables. We ...
Chained Vector Simplex
An algorithm is studied for solving linear programming problems whose coefficient matrix contains a large number of zero entries. This algorithm is more useful when it is generated as a sub-program in a real-time program. Singly linked lists are used to store only the non-zero entries of the coefficient matrix. The modified Revised Simplex Method is also used for solving such probl...
A Well-Conditioned and Sparse Estimation of Covariance and Inverse Covariance Matrices Using a Joint Penalty
We develop a method for estimating well-conditioned and sparse covariance and inverse covariance matrices from a sample of vectors drawn from a sub-Gaussian distribution in a high-dimensional setting. The proposed estimators are obtained by minimizing the quadratic loss function and a joint penalty of the ℓ1 norm and the variance of its eigenvalues. In contrast to some of the existing methods of covariance...
Covariance Matrix Estimation for Stationary Time Series
We obtain a sharp convergence rate for banded covariance matrix estimates of stationary processes. A precise order of magnitude is derived for spectral radius of sample covariance matrices. We also consider a thresholded covariance matrix estimator that can better characterize sparsity if the true covariance matrix is sparse. As our main tool, we implement Toeplitz [Math. Ann. 70 (1911) 351–376...
JPEN Estimation of Covariance and Inverse Covariance Matrix A Well-Conditioned and Sparse Estimation of Covariance and Inverse Covariance Matrices Using a Joint Penalty
We develop a method for estimating well-conditioned and sparse covariance and inverse covariance matrices from a sample of vectors drawn from a sub-Gaussian distribution in a high-dimensional setting. The proposed estimators are obtained by minimizing the quadratic loss function and a joint penalty of the ℓ1 norm and the variance of its eigenvalues. In contrast to some of the existing methods of covariance...
Journal title:
- CoRR
Volume: abs/1505.03001
Publication date: 2015